Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize BigInteger.Parse #97589

Merged
merged 25 commits into from
Jul 3, 2024
Merged

Conversation

kzrnm
Copy link
Contributor

@kzrnm kzrnm commented Jan 27, 2024

Optimize Multiplier

BigInteger.Parse treats $10^{9\times2^n}$ as a multiplier.
The leading $9\times2^n/32$ items of the array representation of $10^{9\times2^n}$ is zero, so the number of arrays for multiplication can be reduced.

For example, the array represention of $10^{9\times2^8} = 10^{2304}$ has 957 items, of which the leading 288 items are 0.

Parse powers of ten

BigInteger.Parse("1" + new string('0', n)) is too slow. It runs in $O(n^2)$.
This is because trailing zeros are always calculated naively.
I fixed the behavior.

Benchmark

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Configs;
using System.Numerics;
using System.Runtime.InteropServices;

BenchmarkSwitcher.FromAssembly(typeof(ParseTest).Assembly).Run(args);

[DisassemblyDiagnoser]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByMethod)]
public class ParseTest
{
    [Params(1000, 10000, 100000, 1000000)]
    public int N;

    string s1, s2, s3;

    [GlobalSetup]
    public void Setup()
    {
        s1 = "1" + new string('0', N);
        s2 = "1" + new string('0', N - 1) + "1";
        s3 = new string('9', N);
    }

    [Benchmark] public BigInteger PowerOfTen() => BigInteger.Parse(s1);
    [Benchmark] public BigInteger PowerOfTenPlusOne() => BigInteger.Parse(s2);
    [Benchmark] public BigInteger PowerOfTenMinusOne() => BigInteger.Parse(s3);
}

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3527/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]     : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2
  Job-DWSMWG : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ECQBBV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2


Method Job Toolchain N Mean Error StdDev Ratio RatioSD Code Size Gen0 Allocated Alloc Ratio
PowerOfTen Job-DWSMWG \main\corerun.exe 1000 5.022 μs 0.0594 μs 0.0496 μs 1.00 0.00 2,909 B 0.0305 440 B 1.00
PowerOfTen Job-ECQBBV \pr\corerun.exe 1000 5.020 μs 0.0969 μs 0.0859 μs 1.00 0.01 2,909 B 0.0305 440 B 1.00
PowerOfTen Job-DWSMWG \main\corerun.exe 10000 404.471 μs 7.6946 μs 7.1975 μs 1.00 0.00 2,902 B - 4187 B 1.00
PowerOfTen Job-ECQBBV \pr\corerun.exe 10000 68.792 μs 1.3603 μs 1.3970 μs 0.17 0.00 2,877 B 0.2441 4185 B 1.00
PowerOfTen Job-DWSMWG \main\corerun.exe 100000 39,121.600 μs 246.3455 μs 205.7096 μs 1.00 0.00 2,901 B - 41612 B 1.00
PowerOfTen Job-ECQBBV \pr\corerun.exe 100000 2,638.464 μs 14.9664 μs 13.2673 μs 0.07 0.00 2,877 B - 41579 B 1.00
PowerOfTen Job-DWSMWG \main\corerun.exe 1000000 4,009,495.647 μs 17,321.2555 μs 16,202.3132 μs 1.00 0.00 94 B - 439056 B 1.00
PowerOfTen Job-ECQBBV \pr\corerun.exe 1000000 97,940.946 μs 487.5769 μs 432.2242 μs 0.02 0.00 2,869 B - 416352 B 0.95
PowerOfTenPlusOne Job-DWSMWG \main\corerun.exe 1000 5.881 μs 0.0758 μs 0.0592 μs 1.00 0.00 2,909 B 0.0305 440 B 1.00
PowerOfTenPlusOne Job-ECQBBV \pr\corerun.exe 1000 5.841 μs 0.0784 μs 0.0733 μs 0.99 0.01 2,877 B 0.0305 440 B 1.00
PowerOfTenPlusOne Job-DWSMWG \main\corerun.exe 10000 278.657 μs 1.8829 μs 1.5723 μs 1.00 0.00 2,877 B 0.4883 8659 B 1.00
PowerOfTenPlusOne Job-ECQBBV \pr\corerun.exe 10000 80.380 μs 1.5721 μs 1.4705 μs 0.29 0.00 2,880 B 0.2441 4184 B 0.48
PowerOfTenPlusOne Job-DWSMWG \main\corerun.exe 100000 9,978.399 μs 84.3427 μs 78.8943 μs 1.00 0.00 2,872 B - 86136 B 1.00
PowerOfTenPlusOne Job-ECQBBV \pr\corerun.exe 100000 3,041.650 μs 13.5062 μs 11.2783 μs 0.30 0.00 2,870 B - 41579 B 0.48
PowerOfTenPlusOne Job-DWSMWG \main\corerun.exe 1000000 370,703.885 μs 3,617.1873 μs 3,020.5152 μs 1.00 0.00 94 B - 866608 B 1.00
PowerOfTenPlusOne Job-ECQBBV \pr\corerun.exe 1000000 112,643.729 μs 870.3710 μs 814.1455 μs 0.30 0.00 2,872 B - 416568 B 0.48
PowerOfTenMinusOne Job-DWSMWG \main\corerun.exe 1000 5.978 μs 0.0465 μs 0.0412 μs 1.00 0.00 2,909 B 0.0305 440 B 1.00
PowerOfTenMinusOne Job-ECQBBV \pr\corerun.exe 1000 6.055 μs 0.1183 μs 0.1265 μs 1.01 0.03 2,877 B 0.0305 440 B 1.00
PowerOfTenMinusOne Job-DWSMWG \main\corerun.exe 10000 279.346 μs 2.8043 μs 2.3417 μs 1.00 0.00 2,877 B 0.4883 8659 B 1.00
PowerOfTenMinusOne Job-ECQBBV \pr\corerun.exe 10000 208.960 μs 1.3014 μs 1.2173 μs 0.75 0.01 2,877 B 0.2441 4186 B 0.48
PowerOfTenMinusOne Job-DWSMWG \main\corerun.exe 100000 10,202.449 μs 58.1694 μs 54.4117 μs 1.00 0.00 2,872 B - 86042 B 1.00
PowerOfTenMinusOne Job-ECQBBV \pr\corerun.exe 100000 7,508.013 μs 72.8591 μs 64.5877 μs 0.74 0.01 2,870 B - 41613 B 0.48
PowerOfTenMinusOne Job-DWSMWG \main\corerun.exe 1000000 367,924.436 μs 2,814.6511 μs 2,495.1149 μs 1.00 0.00 94 B - 866560 B 1.00
PowerOfTenMinusOne Job-ECQBBV \pr\corerun.exe 1000000 260,532.370 μs 4,869.6777 μs 4,555.0995 μs 0.71 0.01 94 B - 415736 B 0.48

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 27, 2024
@ghost
Copy link

ghost commented Jan 27, 2024

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

BigInteger.Parse("1" + new string('0', n)) is too slow. It runs in $O(n^2)$.
This is because trailing zeros are always calculated naively.
I fixed the behavior.

Benchmark

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Numerics;
using System.Runtime.InteropServices;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[DisassemblyDiagnoser]
public class Tests
{
    [Params(1000, 10000, 100000, 1000000)]
    public int N;

    string s1, s2, s3;

    [GlobalSetup]
    public void Setup()
    {
        s1 = "1" + new string('0', N);
        s2 = "1" + new string('0', N - 1) + "1";
        s3 = new string('9', N - 1);
    }

    [Benchmark]
    public BigInteger PowerOfTen() => BigInteger.Parse(s1);

    [Benchmark(Baseline = true)]
    public BigInteger PowerOfTenPlusOne() => BigInteger.Parse(s2);

    [Benchmark]
    public BigInteger PowerOfTenMinusOne() => BigInteger.Parse(s3);
}

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3085/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-alpha.1.23615.4
  [Host]     : .NET 9.0.0 (9.0.23.61410), X64 RyuJIT AVX2
  Job-KEBJOQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-VLAEBK : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2


Method Job Toolchain N Mean Error StdDev Ratio RatioSD Code Size Gen0 Allocated Alloc Ratio
PowerOfTen Job-KEBJOQ \main\corerun.exe 1000 5.464 μs 0.0769 μs 0.0720 μs 0.88 0.01 3,216 B 0.0305 440 B 1.00
PowerOfTenPlusOne Job-KEBJOQ \main\corerun.exe 1000 6.200 μs 0.0829 μs 0.0775 μs 1.00 0.00 3,216 B 0.0305 440 B 1.00
PowerOfTenMinusOne Job-KEBJOQ \main\corerun.exe 1000 5.965 μs 0.0475 μs 0.0444 μs 0.96 0.01 3,216 B 0.0305 440 B 1.00
PowerOfTen Job-VLAEBK \pr\corerun.exe 1000 5.728 μs 0.0488 μs 0.0456 μs 0.92 0.01 3,309 B 0.0763 976 B 2.22
PowerOfTenPlusOne Job-VLAEBK \pr\corerun.exe 1000 6.172 μs 0.0492 μs 0.0460 μs 1.00 0.01 3,309 B 0.0763 976 B 2.22
PowerOfTenMinusOne Job-VLAEBK \pr\corerun.exe 1000 5.778 μs 0.0388 μs 0.0344 μs 0.93 0.01 3,309 B 0.0763 976 B 2.22
PowerOfTen Job-KEBJOQ \main\corerun.exe 10000 425.126 μs 6.6050 μs 6.1783 μs 0.98 0.02 3,216 B - 4184 B 1.00
PowerOfTenPlusOne Job-KEBJOQ \main\corerun.exe 10000 433.456 μs 6.0261 μs 5.6369 μs 1.00 0.00 3,216 B - 4184 B 1.00
PowerOfTenMinusOne Job-KEBJOQ \main\corerun.exe 10000 432.347 μs 3.8901 μs 3.6388 μs 1.00 0.02 3,216 B - 4177 B 1.00
PowerOfTen Job-VLAEBK \pr\corerun.exe 10000 424.032 μs 6.4220 μs 6.0071 μs 0.98 0.02 3,309 B 0.9766 12400 B 2.96
PowerOfTenPlusOne Job-VLAEBK \pr\corerun.exe 10000 434.967 μs 5.7997 μs 5.4250 μs 1.00 0.02 3,309 B 0.9766 12400 B 2.96
PowerOfTenMinusOne Job-VLAEBK \pr\corerun.exe 10000 429.771 μs 5.5313 μs 5.1740 μs 0.99 0.02 3,305 B 0.9766 12392 B 2.96
PowerOfTen Job-KEBJOQ \main\corerun.exe 100000 41,307.777 μs 380.6849 μs 356.0929 μs 3.67 0.08 3,216 B - 41583 B 0.48
PowerOfTenPlusOne Job-KEBJOQ \main\corerun.exe 100000 11,274.413 μs 215.6506 μs 211.7977 μs 1.00 0.00 2,954 B - 86046 B 1.00
PowerOfTenMinusOne Job-KEBJOQ \main\corerun.exe 100000 11,382.852 μs 214.8872 μs 201.0056 μs 1.01 0.03 2,954 B - 86041 B 1.00
PowerOfTen Job-VLAEBK \pr\corerun.exe 100000 4,972.020 μs 73.2368 μs 68.5057 μs 0.44 0.01 3,047 B 7.8125 107118 B 1.24
PowerOfTenPlusOne Job-VLAEBK \pr\corerun.exe 100000 5,093.083 μs 56.5632 μs 52.9092 μs 0.45 0.01 3,047 B - 41556 B 0.48
PowerOfTenMinusOne Job-VLAEBK \pr\corerun.exe 100000 10,432.147 μs 116.1019 μs 108.6018 μs 0.93 0.02 3,047 B - 41564 B 0.48
PowerOfTen Job-KEBJOQ \main\corerun.exe 1000000 4,151,269.093 μs 43,494.0408 μs 40,684.3528 μs 10.46 0.14 94 B - 415720 B 0.48
PowerOfTenPlusOne Job-KEBJOQ \main\corerun.exe 1000000 396,808.680 μs 6,230.0066 μs 5,827.5520 μs 1.00 0.00 94 B - 860144 B 1.00
PowerOfTenMinusOne Job-KEBJOQ \main\corerun.exe 1000000 395,750.933 μs 7,009.0187 μs 6,556.2404 μs 1.00 0.02 94 B - 860864 B 1.00
PowerOfTen Job-VLAEBK \pr\corerun.exe 1000000 199,223.140 μs 2,441.7662 μs 2,284.0296 μs 0.50 0.01 3,047 B - 939717 B 1.09
PowerOfTenPlusOne Job-VLAEBK \pr\corerun.exe 1000000 196,380.902 μs 2,521.1383 μs 2,358.2743 μs 0.50 0.01 3,047 B - 415517 B 0.48
PowerOfTenMinusOne Job-VLAEBK \pr\corerun.exe 1000000 376,507.469 μs 3,501.7144 μs 2,924.0902 μs 0.95 0.01 94 B - 416008 B 0.48
Author: kzrnm
Assignees: -
Labels:

area-System.Numerics, community-contribution

Milestone: -

@kzrnm
Copy link
Contributor Author

kzrnm commented Jan 29, 2024

I updated the benchmark target to the latest commit.

@tannergooding
Copy link
Member

I updated the benchmark target to the latest commit.

Thanks! Slowly getting through stuff here and just want to ensure no conflicts/accurate numbers/etc as they go through.

@tannergooding
Copy link
Member

Looks like there are some related CI failures that need to get resolved.

@kzrnm
Copy link
Contributor Author

kzrnm commented Jan 31, 2024

Tests fail when parseTest and ToStringTest are run in parallel. I think RunWithFakeThreshold is the cause of failure.
I set DisableParallelization so that RunWithFakeThreshold is not executed in parallel.

@kzrnm
Copy link
Contributor Author

kzrnm commented Feb 1, 2024

I was able to reproduce the error with the following code.

Number.s_naiveThreshold = 0;
BigInteger.Parse("99000000000");

@tannergooding
Copy link
Member

@huoyaoyuan, if you have some time I'd appreciate if you could give this a once over as well, given you've also been touching/updating the parsing logic.

I'd ideally like to get #95402 done first and I know there's another PR out, but you recently changed that one back to draft

@huoyaoyuan
Copy link
Member

I'd ideally like to get #95402 done first and I know there's another PR out, but you recently changed that one back to draft

I've done the code update and measuring the performance. I'd expect to make it ready this weekend.

kzrnm added 6 commits May 6, 2024 03:15
# Conflicts:
#	src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs
#	src/libraries/System.Runtime.Numerics/tests/BigInteger/multiply.cs
@tannergooding
Copy link
Member

@kzrnm, sorry for the delay on this. Could you share updated numbers, just so we can confirm nothing has regressed given all the other work that's been done?

After that, I think we can get this merged.

@kzrnm
Copy link
Contributor Author

kzrnm commented Jul 2, 2024

@tannergooding I ran benchmarks on the latest branch.


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3810/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-preview.5.24307.3
  [Host]   : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX2
  ShortRun : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  

Method Toolchain N Mean Error StdDev Ratio RatioSD Gen0 Code Size Allocated Alloc Ratio
PowerOfTen \main\corerun.exe 1000 4.723 μs 0.7022 μs 0.0385 μs 1.00 0.00 0.0305 3,051 B 440 B 1.00
PowerOfTen \pr\corerun.exe 1000 4.727 μs 0.2059 μs 0.0113 μs 1.00 0.01 0.0305 3,051 B 440 B 1.00
PowerOfTen \main\corerun.exe 10000 417.409 μs 85.4963 μs 4.6863 μs 1.00 0.00 - 3,049 B 4185 B 1.00
PowerOfTen \pr\corerun.exe 10000 65.805 μs 2.0472 μs 0.1122 μs 0.16 0.00 0.2441 3,019 B 4184 B 1.00
PowerOfTen \main\corerun.exe 100000 39,509.697 μs 7,036.3549 μs 385.6864 μs 1.00 0.00 - 3,048 B 41634 B 1.00
PowerOfTen \pr\corerun.exe 100000 2,621.521 μs 445.6280 μs 24.4264 μs 0.07 0.00 - 3,016 B 41556 B 1.00
PowerOfTen \main\corerun.exe 1000000 5,547,848.100 μs 44,683.6720 μs 2,449.2628 μs 1.00 0.00 - 94 B 416344 B 1.00
PowerOfTen \pr\corerun.exe 1000000 106,203.493 μs 35,159.8003 μs 1,927.2273 μs 0.02 0.00 - 94 B 415486 B 1.00
PowerOfTenPlusOne \main\corerun.exe 1000 5.631 μs 0.2055 μs 0.0113 μs 1.00 0.00 0.0305 3,049 B 440 B 1.00
PowerOfTenPlusOne \pr\corerun.exe 1000 5.309 μs 0.1675 μs 0.0092 μs 0.94 0.00 0.0305 3,019 B 440 B 1.00
PowerOfTenPlusOne \main\corerun.exe 10000 287.339 μs 16.9327 μs 0.9281 μs 1.00 0.00 0.4883 3,017 B 8657 B 1.00
PowerOfTenPlusOne \pr\corerun.exe 10000 76.457 μs 8.4699 μs 0.4643 μs 0.27 0.00 0.2441 3,017 B 4184 B 0.48
PowerOfTenPlusOne \main\corerun.exe 100000 10,017.519 μs 2,451.2130 μs 134.3593 μs 1.00 0.00 - 3,016 B 86046 B 1.00
PowerOfTenPlusOne \pr\corerun.exe 100000 3,088.058 μs 581.6661 μs 31.8831 μs 0.31 0.00 - 3,017 B 41556 B 0.48
PowerOfTenPlusOne \main\corerun.exe 1000000 371,023.767 μs 47,955.9585 μs 2,628.6279 μs 1.00 0.00 - 94 B 860864 B 1.00
PowerOfTenPlusOne \pr\corerun.exe 1000000 117,715.373 μs 38,234.2942 μs 2,095.7507 μs 0.32 0.00 - 94 B 415429 B 0.48
PowerOfTenMinusOne \main\corerun.exe 1000 5.784 μs 0.7608 μs 0.0417 μs 1.00 0.00 0.0305 3,051 B 440 B 1.00
PowerOfTenMinusOne \pr\corerun.exe 1000 5.551 μs 0.5350 μs 0.0293 μs 0.96 0.01 0.0305 3,018 B 440 B 1.00
PowerOfTenMinusOne \main\corerun.exe 10000 296.562 μs 88.8331 μs 4.8692 μs 1.00 0.00 0.4883 3,017 B 8657 B 1.00
PowerOfTenMinusOne \pr\corerun.exe 10000 208.304 μs 79.8736 μs 4.3781 μs 0.70 0.02 0.2441 3,019 B 4184 B 0.48
PowerOfTenMinusOne \main\corerun.exe 100000 10,168.222 μs 1,159.4629 μs 63.5541 μs 1.00 0.00 - 3,016 B 86046 B 1.00
PowerOfTenMinusOne \pr\corerun.exe 100000 7,655.886 μs 1,894.1289 μs 103.8236 μs 0.75 0.01 - 3,017 B 41560 B 0.48
PowerOfTenMinusOne \main\corerun.exe 1000000 366,404.433 μs 47,445.9123 μs 2,600.6706 μs 1.00 0.00 - 94 B 860864 B 1.00
PowerOfTenMinusOne \pr\corerun.exe 1000000 263,400.400 μs 50,544.4774 μs 2,770.5134 μs 0.72 0.01 - 94 B 415808 B 0.48

@tannergooding tannergooding merged commit 7a5fb4f into dotnet:main Jul 3, 2024
83 checks passed
@kzrnm kzrnm deleted the BigIntegerParsePowOf10 branch July 4, 2024 00:58
@tannergooding
Copy link
Member

Just noting on the regressions above that many of them are for very small inputs, like 123. I've raised that these aren't "representative" of the cases we care about here since if users care about perf for numbers that are less than or equal to 128-bits, then we have existing dedicated types they can use.

Rather, we want to ensure that BigInteger really shines for values above this cutoff. -- There is notably some consideration that we also don't want to overly regress the perf of BigInteger values with 128 or less bits, just that its not as important. For a case like Parse we could likely have a fast path that does something like if (stringLength < 40) { ParseAsInt128 } else { ParseAsBigInt }, since any string with this few digits must fit into Int128 (and a corresponding cutoff for hex numbers, etc). This would likely fix the regression without hurting the improved cases.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants